Handling different geometry types in a GeoPandas.GeoDataFrame

Contents

Handling different geometry types in a GeoPandas.GeoDataFrame#

This notebook will pick up on the error we encountered in an earlier notebook. We show a step-by-step process of troubleshooting and resolving the error.

It looks like we’re getting a warning on the explore() method that is interfering with the functionality that displays feature info when you hover over a feature. Let’s dig into it and see what’s going on. It looks like the issue is being caused by rows of the GeoDataFrame object that have a ‘GeometryCollection’ geometry type. First, I’m going to copy the warning into a cell below. The warning is already in the form of a list of dictionaries, which makes it nice to work with:

Here is the error message separated from the problem geometries:

/home/../python3.11/site-packages/folium/features.py:1102: UserWarning: GeoJsonTooltip is not configured to render for GeoJson GeometryCollection geometries. Please consider reworking these features: ..... to MultiPolygon for full functionality.

And below are the problem geometries identified in the error message saved as a list of dictionaries:

problem_geoms = [
    {
        "rgi_id": "RGI2000-v7.0-G-15-16433",
        "o1region": "15",
        "o2region": "15-03",
        "glims_id": "G095721E29941N",
        "anlys_id": 929520,
        "subm_id": 752,
        "src_date": "2005-09-08T00:00:00",
        "cenlon": 95.7211016152286,
        "cenlat": 29.940902187781784,
        "utm_zone": 46,
        "area_km2": 0.340954350813452,
        "primeclass": 0,
        "conn_lvl": 0,
        "surge_type": 0,
        "term_type": 9,
        "glac_name": None,
        "is_rgi6": 0,
        "termlon": 95.72222864596793,
        "termlat": 29.937137080413784,
        "zmin_m": 4657.792,
        "zmax_m": 5049.5625,
        "zmed_m": 4825.1104,
        "zmean_m": 4839.4185,
        "slope_deg": 23.704372,
        "aspect_deg": 145.20973,
        "aspect_sec": 4,
        "dem_source": "COPDEM30",
        "lmax_m": 891,
    },
    {
        "rgi_id": "RGI2000-v7.0-G-15-12194",
        "o1region": "15",
        "o2region": "15-03",
        "glims_id": "G095869E30315N",
        "anlys_id": 929951,
        "subm_id": 752,
        "src_date": "2005-09-08T00:00:00",
        "cenlon": 95.86889789565677,
        "cenlat": 30.3147685,
        "utm_zone": 46,
        "area_km2": 8.797406997273084,
        "primeclass": 0,
        "conn_lvl": 0,
        "surge_type": 0,
        "term_type": 9,
        "glac_name": None,
        "is_rgi6": 0,
        "termlon": 95.89518363763428,
        "termlat": 30.307036248571297,
        "zmin_m": 4642.1445,
        "zmax_m": 5278.752,
        "zmed_m": 5011.06,
        "zmean_m": 4993.9243,
        "slope_deg": 12.372513,
        "aspect_deg": 81.418945,
        "aspect_sec": 3,
        "dem_source": "COPDEM30",
        "lmax_m": 4994,
    },
    {
        "rgi_id": "RGI2000-v7.0-G-15-11941",
        "o1region": "15",
        "o2region": "15-03",
        "glims_id": "G095301E30377N",
        "anlys_id": 928228,
        "subm_id": 752,
        "src_date": "2007-08-20T00:00:00",
        "cenlon": 95.30071978915663,
        "cenlat": 30.3770025,
        "utm_zone": 46,
        "area_km2": 0.267701958906151,
        "primeclass": 0,
        "conn_lvl": 0,
        "surge_type": 0,
        "term_type": 9,
        "glac_name": None,
        "is_rgi6": 0,
        "termlon": 95.30345982475616,
        "termlat": 30.380097687364806,
        "zmin_m": 5475.784,
        "zmax_m": 5977.979,
        "zmed_m": 5750.727,
        "zmean_m": 5759.621,
        "slope_deg": 41.069595,
        "aspect_deg": 350.3331518173218,
        "aspect_sec": 1,
        "dem_source": "COPDEM30",
        "lmax_m": 807,
    },
]

Setup#

import geopandas as gpd
import pandas as pd
# Read initial gdf
se_asia = gpd.read_parquet("../../itslive/data/vector_data/rgi7_region15_south_asia_east.parquet")

# Read bbox of ITS_LIVE data cube
bbox_dc = gpd.read_file("../../itslive/data/vector_data/bbox_dc.geojson")

# Project the rgi outlines so that its CRS matches the CRS of the bbox
se_asia_prj = se_asia.to_crs(bbox_dc.crs)
assert se_asia_prj.crs == bbox_dc.crs, "CRS of both object do not match."

# Subset the RGI outlines by the bbox
se_asia_subset = gpd.clip(se_asia_prj, bbox_dc)

Troubleshoot#

First, convert the problem_geoms object from a list of dictionary objects to a pandas.DataFrame object:

problem_geoms_df = pd.DataFrame(data=problem_geoms)

Next, make a list of the IDs of the glaciers in this dataframe:

problem_geom_ids = problem_geoms_df["glims_id"].to_list()
problem_geom_ids
['G095721E29941N', 'G095869E30315N', 'G095301E30377N']

Make a geopandas.GeoDataFrame object that is just the above-identified outlines:

problem_geoms_gdf = se_asia_subset.loc[se_asia_subset["glims_id"].isin(problem_geom_ids)]

Check the geometry-type of these outlines and compare them to another outline from the se_asia_subset object that wasn’t flagged:

problem_geoms_gdf.loc[problem_geoms_gdf["glims_id"] == "G095301E30377N"].geom_type
11940    GeometryCollection
dtype: object
se_asia_subset.loc[se_asia_subset["rgi_id"] == "RGI2000-v7.0-G-15-11754"].geom_type
11753    Polygon
dtype: object

Now, the warning we saw above makes more sense. Most features in se_asia_subset have geom_type = Polygon but the flagged features have geom_type= GeometryCollection.

Let’s dig a bit more into these flagged geometries. To do this, use the geopandas method explode() to split multiple geometries into multiple single geometries.

first_flagged_feature = problem_geoms_gdf[problem_geoms_gdf.glims_id == "G095301E30377N"].geometry.explode(
    index_parts=True
)

Printing this object shows that it actually contains a polygon geometry and a linestring geometry:

first_flagged_feature
11940  0    POLYGON Z ((721543.154 3362894.920 0.000, 7215...
       1    LINESTRING Z (721119.337 3362535.315 0.000, 72...
Name: geometry, dtype: geometry

Let’s look at the other two:

second_flagged_feature = problem_geoms_gdf[problem_geoms_gdf.glims_id == "G095869E30315N"].geometry.explode(
    index_parts=True
)
second_flagged_feature
12193  0    POLYGON Z ((774975.604 3356334.670 0.000, 7749...
       1    LINESTRING Z (776713.643 3359150.691 0.000, 77...
Name: geometry, dtype: geometry
third_flagged_feature = problem_geoms_gdf[problem_geoms_gdf.glims_id == "G095721E29941N"].geometry.explode(
    index_parts=True
)
third_flagged_feature
16432  0    POLYGON Z ((762550.755 3315675.227 0.000, 7625...
       1    LINESTRING Z (762974.041 3315372.040 0.000, 76...
Name: geometry, dtype: geometry

Check out some properties of the line geometry objects, such as length:

print(third_flagged_feature[1:].length.iloc[0])
print(second_flagged_feature[1:].length.iloc[0])
print(third_flagged_feature[1:].length.iloc[0])
0.07253059141477144
0.07953751551272183
0.07253059141477144

It looks like all of the linestring objects are very small, possibly artifacts, and don’t need to remain in the dataset. For simplicity, we can remove them from the original object. There are different ways to do this, but here’s one approach:

  1. Identify and remove all features with the GeometryCollection geom_type:

se_asia_subset_polygon = se_asia_subset[
    ~se_asia_subset["geometry"].apply(lambda x: x.geom_type == "GeometryCollection")
]
/home/emmamarshall/miniforge3/envs/geospatial_datacube_book_env/lib/python3.11/site-packages/geopandas/geoseries.py:645: FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
  result = super().apply(func, convert_dtype=convert_dtype, args=args, **kwargs)
  1. Remove the line geometries from the GeometryCollection features:

se_asia_subset_geom_collection = se_asia_subset[
    se_asia_subset["geometry"].astype(object).apply(lambda x: x.geom_type == "GeometryCollection")
]
/home/emmamarshall/miniforge3/envs/geospatial_datacube_book_env/lib/python3.11/site-packages/geopandas/geoseries.py:645: FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
  result = super().apply(func, convert_dtype=convert_dtype, args=args, **kwargs)
  1. Make an object that is just the features where geom_type = Polygon:

keep_polygons = se_asia_subset_geom_collection.explode(index_parts=True).loc[
    se_asia_subset_geom_collection.explode(index_parts=True).geom_type == "Polygon"
]
  1. Append the polygons to the se_asia_subset_polygons object:

se_asia_polygons = pd.concat([se_asia_subset_polygon, keep_polygons])

As a sanity check, let’s make sure that we didn’t lose any glacier outline features during all of that:

len(se_asia_subset["rgi_id"]) == len(se_asia_polygons)
True

Great, we know that we have the same number of glaciers that we started with.

Now, let’s try visualizing the outlines with explore() again and seeing if the hover tools work:

se_asia_polygons.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook